
    PasMoQAP: A Parallel Asynchronous Memetic Algorithm for solving the Multi-Objective Quadratic Assignment Problem

    Multi-Objective Optimization Problems (MOPs) have attracted growing attention during the last decades. Multi-Objective Evolutionary Algorithms (MOEAs) have been extensively used to address MOPs because they are able to approximate a set of high-quality non-dominated solutions. The Multi-Objective Quadratic Assignment Problem (mQAP) is one such MOP: it generalizes the classical QAP, which has been extensively studied and used in several real-life applications, by taking as input several flows between the facilities, which generate multiple cost functions that must be optimized simultaneously. In this study, we propose PasMoQAP, a parallel asynchronous memetic algorithm to solve the Multi-Objective Quadratic Assignment Problem. PasMoQAP is based on an island model that structures the population into sub-populations. The memetic algorithm on each island evolves a reduced population of solutions, and the islands cooperate asynchronously by sending selected solutions to their neighbors. The experimental results show that our approach significantly outperforms all the island-based variants of the multi-objective evolutionary algorithm NSGA-II. We show that PasMoQAP is a suitable alternative for solving the Multi-Objective Quadratic Assignment Problem.

    Comment: 8 pages, 3 figures, 2 tables. Accepted at the Conference on Evolutionary Computation 2017 (CEC 2017)
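The island-model cooperation pattern the abstract describes can be sketched as follows. This is a minimal, single-objective, synchronous stand-in for the real algorithm (PasMoQAP is multi-objective and asynchronous): `fitness`, `evolve` and `migrate` are hypothetical names, and the objective is a toy bit-counting function rather than a QAP cost.

```python
import random

random.seed(0)

def fitness(bits):
    # Toy stand-in for a QAP cost: maximise the number of ones.
    return sum(bits)

def mutate(bits):
    child = bits[:]
    i = random.randrange(len(child))
    child[i] ^= 1                     # flip one bit
    return child

def evolve(pop, steps):
    # Steady-state step on one island: mutate the current best,
    # replace the worst individual if the child improves on it.
    for _ in range(steps):
        child = mutate(max(pop, key=fitness))
        worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
        if fitness(child) > fitness(pop[worst]):
            pop[worst] = child

def migrate(islands):
    # Ring topology: each island's best replaces its neighbour's worst.
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        incoming = bests[(i - 1) % len(islands)]
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        if fitness(incoming) > fitness(pop[worst]):
            pop[worst] = incoming[:]

n_bits, n_islands = 32, 4
islands = [[[random.randint(0, 1) for _ in range(n_bits)] for _ in range(10)]
           for _ in range(n_islands)]
for _ in range(30):                   # epochs of local evolution + migration
    for pop in islands:
        evolve(pop, steps=50)
    migrate(islands)

best = max((ind for pop in islands for ind in pop), key=fitness)
print(fitness(best))                  # should approach the optimum of 32
```

The design point illustrated is that each island evolves independently and only exchanges a few elite solutions with its neighbour, rather than sharing one global population.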

    Entropy-based High Performance Computation of Boolean SNP-SNP Interactions Using GPUs

    It is increasingly accepted that traditional statistical Single Nucleotide Polymorphism (SNP) analysis of Genome-Wide Association Studies (GWAS) reveals just a small part of the heritability of complex diseases. Studying SNP-SNP interactions identifies additional SNPs that contribute to disease but that do not reach genome-wide significance or exhibit only epistatic effects. We have introduced a methodology for genome-wide screening of epistatic interactions that can be handled by state-of-the-art high-performance computing technology. Unlike standard software, our method computes all Boolean binary interactions between SNPs across the whole genome without assuming a particular model of interaction. Our exhaustive search for epistasis comes at the expense of higher computational complexity, which we tackled using graphics processors (GPUs) to reduce the computational time from several months on a cluster of CPUs to 3-4 days on a multi-GPU platform. Here, we contribute a new entropy-based function to evaluate the interaction between SNPs which does not compromise the findings about the most significant SNP interactions, but is more than 4000 times lighter in terms of computational time when running on GPUs and yields code more than 100x faster than a CPU of similar cost. We deploy a number of optimization techniques to tune the implementation of this function in CUDA and show how to enhance scalability on larger data sets.

    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work was also supported by an Australian Research Council Future Fellowship to Prof. Moscato, by a grant from the ARC Discovery Project scheme, and by the Ministry of Education of Spain under project TIN2006-01078 and mobility grant PR2011-0144. We also thank NVIDIA for hardware donations under the CUDA Teaching and Research Center awards.
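An entropy-based evaluation of a SNP pair can be illustrated with mutual information, a standard entropy-derived score (the paper's exact function is not reproduced here, so this is an assumed stand-in). The XOR phenotype below shows why pairwise Boolean interactions matter: neither SNP is informative alone, only their combination is.

```python
from collections import Counter
from math import log2

def entropy(xs):
    # Shannon entropy (in bits) of the empirical distribution of xs.
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Toy data: the phenotype is the XOR of two binarised SNPs, a purely
# epistatic pattern in which neither SNP is informative on its own.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
pheno = [a ^ b for a, b in zip(snp_a, snp_b)]
xor_ab = [a ^ b for a, b in zip(snp_a, snp_b)]

print(round(mutual_information(snp_a, pheno), 6))   # 0.0: a single SNP tells nothing
print(round(mutual_information(xor_ab, pheno), 6))  # 1.0: the interaction is fully informative
```

Computing such a score for every SNP pair across a whole genome is what makes the exhaustive search expensive and a natural fit for GPUs.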

    Graph algorithms for machine learning: a case-control study based on prostate cancer populations and high throughput transcriptomic data

    Background: The continuing proliferation of high-throughput biological data promises to revolutionize personalized medicine. Confirming the presence or absence of disease is an important goal. In this study, we seek to identify genes, gene products and biological pathways that are crucial to human health, with prostate cancer chosen as the target disease. Materials and methods: Using case-control transcriptomic data, we devise a graph-theoretical toolkit for this task. It employs both innovative algorithms and novel two-way correlations to pinpoint putative biomarkers that classify unknown samples as cancerous or normal. Results and conclusion: Observed accuracy on real data suggests that we are able to achieve a sensitivity of 92% and a specificity of 91%.
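The reported 92% sensitivity and 91% specificity are the usual confusion-matrix ratios; a minimal sketch of how such figures are computed (with made-up labels, not the study's data):

```python
def sensitivity_specificity(y_true, y_pred):
    # sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# 1 = cancerous, 0 = normal (illustrative labels only)
y_true = [1] * 25 + [0] * 25
y_pred = [1] * 23 + [0] * 2 + [0] * 23 + [1] * 2
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.92 0.92
```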

    Quantifying the regeneration of bone tissue in biomedical images via Legendre moments

    Article published in the conference proceedings.

    We investigate the use of Legendre moments as biomarkers for an efficient and accurate classification of bone tissue in images coming from stem cell regeneration studies. Regions of existing bone, cartilage or new bone-forming cells are characterized at the tile level to quantify the degree of bone regeneration depending on culture conditions. Legendre moments are analyzed from three different perspectives: (1) their discriminant properties in a wide set of preselected feature vectors based on our clinical and computational experience, providing solutions whose accuracy exceeds 90%; (2) the amount of information to be retained when using Principal Component Analysis (PCA) to reduce the dimensionality of the problem from 2 to 6 dimensions; (3) the use of the (alpha-beta)-k-feature set problem to identify, from a combinatorial optimization approach, the k=4 features most relevant to our analysis. These techniques are compared in terms of computational complexity and classification accuracy to assess the strengths and limitations of the use of Legendre moments for this biomedical image processing application.

    Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech.
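A sketch of how Legendre moments of an image tile can be computed (a generic discrete approximation, not the authors' implementation): map pixel coordinates onto [-1, 1] and integrate the tile against products of Legendre polynomials.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def legendre_moments(tile, order):
    """Discrete Legendre moments lambda_pq of a 2-D tile for p, q <= order,
    with pixel coordinates mapped onto [-1, 1] along both axes."""
    n, m = tile.shape
    x = np.linspace(-1, 1, m)
    y = np.linspace(-1, 1, n)
    # Row p of Px/Py holds the Legendre polynomial P_p sampled on the grid.
    Px = np.stack([legval(x, [0] * p + [1]) for p in range(order + 1)])
    Py = np.stack([legval(y, [0] * p + [1]) for p in range(order + 1)])
    dx, dy = 2 / (m - 1), 2 / (n - 1)
    lam = np.empty((order + 1, order + 1))
    for p in range(order + 1):
        for q in range(order + 1):
            norm = (2 * p + 1) * (2 * q + 1) / 4
            lam[p, q] = norm * (Py[p] @ tile @ Px[q]) * dx * dy
    return lam

tile = np.ones((64, 64))    # uniform tile: only lambda_00 should be non-zero
lam = legendre_moments(tile, order=2)
print(round(lam[0, 0], 2))  # close to 1.0 up to discretisation error
```

The low-order moments of each tile then serve as the feature vector fed to the classifiers described above.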

    Hierarchical Clustering Using the Arithmetic-Harmonic Cut: Complexity and Experiments

    Clustering, particularly hierarchical clustering, is an important method for understanding and analysing data across a wide variety of knowledge domains, with notable utility in systems where the data can be classified in an evolutionary context. This paper introduces a new hierarchical clustering problem defined by a novel objective function we call the arithmetic-harmonic cut. We show that the problem of finding such a cut is NP-hard and APX-hard but fixed-parameter tractable, which indicates that although the problem is unlikely to have a polynomial time algorithm (even for approximation), exact parameterized and local search based techniques may produce workable algorithms. To this end, we implement a memetic algorithm for the problem and demonstrate the effectiveness of the arithmetic-harmonic cut on a number of datasets, including a cancer-type dataset and a coronavirus dataset. We show favorable performance compared to currently used hierarchical clustering techniques such as k-Means, Graclus and Normalized-Cut. The arithmetic-harmonic cut metric overcomes difficulties that other hierarchical methods have in representing both inter-cluster differences and intra-cluster similarities.
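A minimal sketch of the local-search core such a memetic algorithm relies on. The objective below is only an illustrative stand-in for the arithmetic-harmonic cut (the exact definition is in the paper): it rewards large inter-cluster distances and, through reciprocals, small intra-cluster distances; `ah_objective` and `local_search` are hypothetical names, and the data assumes distinct points (non-zero distances within a cluster).

```python
import random

def ah_objective(D, part):
    # Illustrative stand-in for the arithmetic-harmonic cut score: sum of
    # inter-cluster distances plus sum of reciprocal intra-cluster distances,
    # so wide separations and tight clusters both raise the score.
    inter = intra = 0.0
    n = len(D)
    for i in range(n):
        for j in range(i + 1, n):
            if part[i] != part[j]:
                inter += D[i][j]
            else:
                intra += 1.0 / D[i][j]
    return inter + intra

def local_search(D, iters=500, seed=0):
    # Flip-one-vertex hill climbing, the typical local-search component
    # inside a memetic algorithm.
    rng = random.Random(seed)
    n = len(D)
    part = [rng.randint(0, 1) for _ in range(n)]
    best = ah_objective(D, part)
    for _ in range(iters):
        i = rng.randrange(n)
        part[i] ^= 1
        score = ah_objective(D, part)
        if score > best:
            best = score
        else:
            part[i] ^= 1   # revert a non-improving move
    return part, best

# Two well-separated clusters on a line; D holds pairwise distances.
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
D = [[abs(a - b) for b in pts] for a in pts]
part, score = local_search(D)
print(part, score)
```

In the full memetic algorithm this local search would refine each member of an evolving population of partitions rather than a single random start.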

    Uncovering Molecular Biomarkers That Correlate Cognitive Decline with the Changes of Hippocampus' Gene Expression Profiles in Alzheimer's Disease

    Background: Alzheimer's disease (AD) is characterized by a neurodegenerative progression that alters cognition. On a phenotypical level, cognition is evaluated by means of the Mini-Mental State Examination (MMSE), and the post-mortem examination of the neurofibrillary tangle count (NFT) helps to confirm an AD diagnosis. The MMSE evaluates different aspects of cognition including orientation, short-term memory (retention and recall), attention and language. As there is a normal cognitive decline with aging, and death is the final state at which NFTs can be counted, the identification of brain gene expression biomarkers from these phenotypical measures has been elusive. Methodology/Principal Findings: We have reanalysed a microarray dataset contributed in 2004 by Blalock et al. of 31 samples corresponding to hippocampus gene expression from 22 AD subjects of varying degrees of severity and 9 controls. Instead of relying only on correlations of gene expression with the associated MMSE and NFT measures, and by using modern bioinformatics methods based on information theory and combinatorial optimization, we uncovered a 1,372-probe gene expression signature that presents a high consensus with established markers of progression in AD. The signature reveals alterations in calcium, insulin, phosphatidylinositol and Wnt signalling. Among the gene probes most correlated with AD severity we found those linked to synaptic function, neurofilament bundle assembly and neuronal plasticity. Conclusions/Significance: A transcription factor analysis of the 1,372-probe signature reveals significant associations with the EGR/KROX family of proteins, MAZ, and E2F1. The gene homologues of EGR1 (zif268, Egr-1 or Zenk), together with other members of the EGR family, are consolidating a key role in neuronal plasticity in the brain. These results indicate a degree of commonality between putative genes involved in AD and prion-induced neurodegenerative processes that warrants further investigation.
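The correlation-only baseline the abstract improves upon can be sketched as ranking probes by absolute Pearson correlation against the MMSE scores (all probe names and values below are made up for illustration):

```python
import statistics

def pearson(xs, ys):
    # Pearson correlation coefficient r between two equal-length samples.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def rank_probes(expression, mmse):
    """expression: {probe_id: [value per sample]}. Returns probe ids sorted
    by |Pearson r| against the MMSE scores, strongest association first."""
    return sorted(expression, key=lambda p: -abs(pearson(expression[p], mmse)))

mmse = [30, 28, 25, 20, 14, 8]                     # made-up severity gradient
expression = {
    "probe_down": [5.0, 4.8, 4.1, 3.5, 2.9, 2.0],  # expression tracks decline
    "probe_flat": [1.0, 1.1, 0.9, 1.0, 1.1, 0.9],  # uninformative probe
}
print(rank_probes(expression, mmse))  # ['probe_down', 'probe_flat']
```

The study's point is precisely that such univariate rankings miss much of the signal, which is why information-theoretic and combinatorial methods were added on top.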

    A Kernelisation Approach for Multiple d-Hitting Set and Its Application in Optimal Multi-Drug Therapeutic Combinations

    Therapies consisting of a combination of agents are an attractive proposition, especially in the context of diseases such as cancer, which can manifest with a variety of tumor types in a single case. However, uncovering usable drug combinations is expensive, both financially and in time. By employing computational methods to identify candidate combinations with a greater likelihood of success, we can avoid these problems, even when the amount of data is prohibitively large. Hitting Set is a combinatorial problem that has useful applications across many fields; however, as it is NP-complete, it is traditionally considered hard to solve exactly. We introduce a more general version of the problem, (α,β,d)-Hitting Set, which allows more precise control over how and what the hitting set targets. Employing the framework of Parameterized Complexity, we show that despite being NP-complete, the (α,β,d)-Hitting Set problem is fixed-parameter tractable with a kernel of size O(α^d k^d) when we parameterize by the size k of the hitting set and the maximum number α of the minimum number of hits, treating the maximum degree d of the target sets as a constant. We demonstrate the application of this problem to multiple drug selection for cancer therapy, showing the flexibility of the problem in tailoring such drug sets. The fixed-parameter tractability result indicates that for low values of the parameters the problem can be solved quickly using exact methods. We also demonstrate that the problem is indeed practical, with computation times on the order of 5 seconds, compared to previous Hitting Set applications using the same dataset, which exhibited times on the order of 1 day, even with relatively relaxed notions of what constitutes a low value for the parameters. Furthermore, the existence of a kernelization for (α,β,d)-Hitting Set indicates that the problem scales readily to large datasets.
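The classic bounded-search-tree argument behind the fixed-parameter tractability claim can be sketched for plain d-Hitting Set (without the α and β refinements of the paper's variant): branch on the at-most-d elements of any un-hit set, so the search tree has at most d^k leaves.

```python
def hitting_set(sets, k, chosen=frozenset()):
    """Find a hitting set of size <= k for a family of sets, or return None.
    Classic FPT branching: pick any set not yet hit and try each of its
    (at most d) elements, recursing with budget k - 1, so the search tree
    has at most d**k leaves."""
    unhit = next((s for s in sets if not (s & chosen)), None)
    if unhit is None:
        return set(chosen)           # every set is hit
    if k == 0:
        return None                  # budget exhausted
    for v in unhit:
        result = hitting_set(sets, k - 1, chosen | {v})
        if result is not None:
            return result
    return None

# Toy instance: each set lists hypothetical drug targets for one tumour type.
sets = [frozenset(s) for s in [{1, 2}, {2, 3}, {4, 5}, {1, 4}]]
print(hitting_set(sets, k=2))   # a valid answer, e.g. {2, 4}
print(hitting_set(sets, k=1))   # None: no single target hits all four sets
```

Kernelization goes further than this search: it first shrinks the instance itself to a size bounded only by the parameters, which is what makes the approach scale to large datasets.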

    Iteratively refining breast cancer intrinsic subtypes in the METABRIC dataset

    BACKGROUND: Multi-gene lists and single-sample predictor models are currently used to reduce the multidimensional complexity of breast cancers and to identify intrinsic subtypes. The perceived inability of some models to deal with the challenges of processing high-dimensional data, however, limits the accurate characterisation of these subtypes. Towards the development of robust strategies, we designed an iterative approach to consistently discriminate intrinsic subtypes and improve class prediction in the METABRIC dataset. FINDINGS: In this study, we employed the CM1 score to identify the most discriminative probes for each group, and an ensemble learning technique to assess the ability of these probes to assign subtype labels using 24 different classifiers. Our analysis comprises an iterative computation of these methods and statistical measures performed on a set of over 2000 samples. The refined labels assigned using this iterative approach proved to be more consistent and in better agreement with clinicopathological markers and patients' overall survival than those originally provided by the PAM50 method. CONCLUSIONS: The assignment of intrinsic subtypes has a significant impact on translational research for both understanding and managing breast cancer. The refined labelling therefore provides more accurate and reliable information by improving the source of fundamental science prior to clinical applications in medicine.
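The ensemble step can be illustrated with a plain majority vote over classifier outputs (a simplified stand-in for the 24-classifier ensemble; the classifiers, samples and agreement measure below are made up, with PAM50-style subtype names):

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per classifier, samples in the same
    order. Returns the consensus label per sample and the agreement fraction."""
    consensus, agreement = [], []
    for votes in zip(*predictions):
        label, count = Counter(votes).most_common(1)[0]
        consensus.append(label)
        agreement.append(count / len(votes))
    return consensus, agreement

# Three hypothetical classifiers labelling four samples.
preds = [
    ["LumA", "LumB", "Basal", "Her2"],
    ["LumA", "LumA", "Basal", "Her2"],
    ["LumA", "LumB", "Basal", "LumB"],
]
labels, agree = majority_vote(preds)
print(labels)                         # ['LumA', 'LumB', 'Basal', 'Her2']
print([round(a, 2) for a in agree])   # [1.0, 0.67, 1.0, 0.67]
```

Samples with low agreement are natural candidates for the relabelling that each iteration of the refinement revisits.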

    Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials

    In Artificial Intelligence we often seek to identify an unknown target function of many variables $y=f(\mathbf{x})$ given a limited set of instances $S=\{(\mathbf{x}^{(i)},y^{(i)})\}$ with $\mathbf{x}^{(i)} \in D$, where $D$ is a domain of interest. We refer to $S$ as the training set, and the final quest is to identify the mathematical model that approximates this target function for new $\mathbf{x}$ from a set $T=\{\mathbf{x}^{(j)}\} \subset D$ with $T \neq S$ (i.e., testing the model's generalisation). However, for some applications, the main interest is approximating well the unknown function on a larger domain $D'$ that contains $D$. In cases involving the design of new structures, for instance, we may be interested in maximizing $f$; thus, the model derived from $S$ alone should also generalize well in $D'$ for samples with values of $y$ larger than the largest observed in $S$. In that sense, the AI system would provide important information that could guide the design process, e.g., using the learned model as a surrogate function to design new lab experiments. We introduce a method for multivariate regression based on iterative fitting of a continued fraction that incorporates additive spline models. We compared it with established methods such as AdaBoost, Kernel Ridge, Linear Regression, Lasso Lars, Linear Support Vector Regression, Multi-Layer Perceptrons, Random Forests, Stochastic Gradient Descent and XGBoost. We tested the performance on the important problem of predicting the critical temperature of superconductors based on physical-chemical characteristics.

    Comment: Submitted to IEEE Transactions on Artificial Intelligence (TAI)
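Once its per-level terms are fitted, a continued-fraction regression model is evaluated bottom-up. A minimal sketch (plain callables stand in for the paper's additive spline models; the function and term names are hypothetical):

```python
def eval_continued_fraction(x, terms):
    """Evaluate a0(x) + b1(x) / (a1(x) + b2(x) / (a2(x) + ...)) bottom-up.
    terms = [a0, (b1, a1), (b2, a2), ...], every entry a callable of x."""
    a0, rest = terms[0], terms[1:]
    value = 0.0
    for b, a in reversed(rest):      # innermost level first
        value = b(x) / (a(x) + value)
    return a0(x) + value

# Hypothetical one-level fraction representing f(x) = x + 1 / (x + 2).
terms = [lambda x: x, (lambda x: 1.0, lambda x: x + 2)]
print(eval_continued_fraction(3.0, terms))
```

Because each added level is a ratio of simple models, the overall function can grow or diverge outside the training domain, which is the behaviour the paper exploits for extrapolation.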